7 research outputs found

GraphX: Unifying Data-Parallel and Graph-Parallel Analytics

Author: Crankshaw Daniel
Dave Ankur
Franklin Michael J.
Gonzalez Joseph E.
Stoica Ion
Xin Reynold S.
Publication date: 11/02/2014

From social networks to language modeling, the growing scale and importance of graph data has driven the development of numerous new graph-parallel systems (e.g., Pregel, GraphLab). By restricting the computation that can be expressed and introducing new techniques to partition and distribute the graph, these systems can efficiently execute iterative graph algorithms orders of magnitude faster than more general data-parallel systems. However, the same restrictions that enable the performance gains also make it difficult to express many of the important stages in a typical graph-analytics pipeline: constructing the graph, modifying its structure, or expressing computation that spans multiple graphs. As a consequence, existing graph analytics pipelines compose graph-parallel and data-parallel systems using external storage systems, leading to extensive data movement and a complicated programming model. To address these challenges, we introduce GraphX, a distributed graph computation framework that unifies graph-parallel and data-parallel computation. GraphX provides a small, core set of graph-parallel operators expressive enough to implement the Pregel and PowerGraph abstractions, yet simple enough to be cast in relational algebra. GraphX uses a collection of query optimization techniques such as automatic join rewrites to efficiently implement these graph-parallel operators. We evaluate GraphX on real-world graphs and workloads and demonstrate that GraphX achieves performance comparable to specialized graph computation systems, while outperforming them in end-to-end graph pipelines. Moreover, GraphX achieves a balance between expressiveness, performance, and ease of use.
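The claim that graph-parallel operators can be "cast in relational algebra" can be illustrated with a toy sketch. The Python below is not GraphX code and not its Scala API; `triplets`, `aggregate_messages`, and `pagerank` are hypothetical helpers written for illustration only. It assumes a graph stored as a vertex table and an edge table: a triplet view is a join of edges with vertices on both endpoints, and message passing is a map over triplets followed by a GROUP BY on the destination vertex.

```python
from collections import defaultdict

# Hypothetical toy graph: vertex table (id -> attribute) and edge table (src, dst).
vertices = {1: 1.0, 2: 1.0, 3: 1.0}
edges = [(1, 2), (1, 3), (2, 3), (3, 1)]

def triplets(vertices, edges):
    """Join the edge table with the vertex table on both endpoints,
    yielding (src, src_attr, dst, dst_attr) rows."""
    return [(s, vertices[s], d, vertices[d]) for s, d in edges]

def aggregate_messages(trips, send, merge):
    """Map each triplet to a message for its destination vertex, then
    group by destination and combine messages with `merge` (a GROUP BY)."""
    out = {}
    for s, sa, d, da in trips:
        msg = send(s, sa, d, da)
        out[d] = merge(out[d], msg) if d in out else msg
    return out

def pagerank(vertices, edges, iters=10, reset=0.15):
    """Unnormalized PageRank built only from the two operators above."""
    out_deg = defaultdict(int)
    for s, _ in edges:
        out_deg[s] += 1
    ranks = dict(vertices)
    for _ in range(iters):
        msgs = aggregate_messages(
            triplets(ranks, edges),
            send=lambda s, sa, d, da: sa / out_deg[s],  # share rank over out-edges
            merge=lambda a, b: a + b,                   # sum incoming contributions
        )
        # Vertices that receive no messages keep only the reset term.
        ranks = {v: reset + (1 - reset) * msgs.get(v, 0.0) for v in ranks}
    return ranks
```

For example, `aggregate_messages(triplets(vertices, edges), lambda s, sa, d, da: 1, lambda a, b: a + b)` computes in-degrees. A distributed system would execute the same join and group-by over partitioned tables, which is where join-rewrite optimizations of the kind the abstract mentions would apply.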

arXiv.org e-Print Archive

CiteSeerX

Delta lake: high-performance ACID table storage over cloud object stores

Author: Armbrust M. (Michael)
Boncz P.A. (Peter)
Das T. (Tathagata)
Ghodsi A. (Ali)
Hovell H. (Herman) van
Ionescu A. (Adrian)
Li X. (Xiao)
Mokhtar M. (Mostafa)
Murthy M. (Mukul)
Paranjpye S. (Sameer)
Senster P. (Pieter)
Sun L. (Liwen)
Switakowski M.
Szafranski M. (Michal)
Torres J. (Joseph)
Ueshin T. (Takuya)
Xin R. (Reynold)
Yavuz B. (Burak)
Zaharia M. (Matei)
Zhu S. (Shixiong)
Łuszczak A.
Publication venue: 'VLDB Endowment'
Publication date: 14/09/2020

Cloud object stores such as Amazon S3 are some of the largest and most cost-effective storage systems on the planet, making them an attractive target to store large data warehouses and data lakes. Unfortunately, their implementation as key-value stores makes it difficult to achieve ACID transactions and high performance: metadata operations such as listing objects are expensive, and consistency guarantees are limited. In this paper, we present Delta Lake, an open source ACID table storage layer over cloud object stores initially developed at Databricks. Delta Lake uses a transaction log that is compacted into Apache Parquet format to provide ACID properties, time travel, and significantly faster metadata operations for large tabular datasets (e.g., the ability to quickly search billions of table partitions for those relevant to a query). It also leverages this design to provide high-level features such as automatic data layout optimization, upserts, caching, and audit logs. Delta Lake tables can be accessed from Apache Spark, Hive, Presto, Redshift and other systems. Delta Lake is deployed at thousands of Databricks customers that process exabytes of data per day, with the largest instances managing exabyte-scale datasets and billions of objects.
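The idea of deriving table state from a transaction log rather than from object listings can be sketched in a few lines. This is a deliberately simplified, in-memory model, not Delta Lake's implementation or on-disk format: the JSON action shapes, the `(version, files)` checkpoint tuple, and the `snapshot`/`commit` helpers are all hypothetical. Replaying `add`/`remove` actions up to a version yields the live file set (time travel); a checkpoint lets replay start from a compacted state; and a commit succeeds only if no other writer has appended a newer version.

```python
import json

# Hypothetical, simplified log: each committed version is a list of
# JSON-encoded actions. Real Delta Lake log records carry many more fields.
log = [
    ['{"add": {"path": "part-0.parquet"}}', '{"add": {"path": "part-1.parquet"}}'],
    ['{"remove": {"path": "part-0.parquet"}}', '{"add": {"path": "part-2.parquet"}}'],
    ['{"add": {"path": "part-3.parquet"}}'],
]

def snapshot(log, version, checkpoint=None):
    """Return the set of live data files as of `version` by replaying actions
    in order. `checkpoint` is an optional (version, files) pair standing in
    for a compacted log state; its version must not exceed `version`."""
    if checkpoint is not None:
        start, files = checkpoint[0] + 1, set(checkpoint[1])
    else:
        start, files = 0, set()
    for entry in log[start:version + 1]:
        for line in entry:
            action = json.loads(line)
            if "add" in action:
                files.add(action["add"]["path"])
            elif "remove" in action:
                files.discard(action["remove"]["path"])
    return files

def commit(log, read_version, actions):
    """Optimistic concurrency: append a new version only if the log has not
    advanced past the version this writer read; otherwise report a conflict."""
    if len(log) - 1 != read_version:
        raise RuntimeError("conflict: table changed since it was read")
    log.append(actions)
    return len(log) - 1
```

Here `snapshot(log, 0)` reads the table as of its first version, while `snapshot(log, 2, checkpoint=(1, snapshot(log, 1)))` replays only the entries after the checkpoint. On an object store, the conflict check would need an atomic "create if absent" on the next log entry; the list-length comparison above only models that step.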

CWI's Institutional Repository

A survey of community search over big graphs

Author: A Angadi
A Broder
A Clauset
A Gibbons
A Lancichinetti
A Mehler
A Montresor
AE Sarıyüce
AE Sariyüce
Alessandro Acquisti
Alessia Amelio
AV Goldberg
B Adamcsek
B Balasundaram
B Yang
Bo Yang
C Shi
D Guo
D Wen
D-N Yang
E Akbas
E Galbrun
EA Leicht
F Bi
F Zhang
F Zhao
F-Y Wu
FD Malliaros
G Palla
G Rossetti
H Cheng
H Matsuda
J Cohen
J Elzinga
J Kim
J Lee
J Wang
J Yang
JE Gonzalez
Jeffrey Xu Yu
JR Ullmann
Jubin Edachery
K Macropol
K Mehlhorn
K Saito
L Chen
L Danon
L Kou
L Lai
L Yuan
L Zou
Lijun Cai
Linlin Ding
Lu Qin
M Barthélemy
M Bazzi
M Coscia
M Girvan
M Kargar
M Qiao
MA Porter
Mauro Brunato
MB Hastings
ME Newman
Michel Plantié
Moses Charikar
MR Garey
N Armenatzoglou
N Barbieri
N Chiba
N Gulbahce
N Jayaram
N Wang
O Batarfi
P Expert
P Yi
Pascal Pons
Q Zhu
R Guimera
R-H Li
Reynold Cheng
S Fortunato
S Gregory
S Harenberg
S Papadopoulos
S. Parthasarathy
SB Seidman
U Von Luxburg
W Fan
W Khaouid
Wenjie Zhang
X Hu
X Huang
X Ning
Xin Huang
Xuemin Lin
Y Chen
Y Fang
Y Kim
Y Wang
Y Wu
Y Yuan
Y Zhou
Ying Zhang
Yixiang Fang
Z Li
Z Zheng
Zhana Kuncheva
Publication venue: 'Springer Science and Business Media LLC'

Crossref